
Last Update: 2025/3/26

Qwen Audio Transcription API

The Qwen Audio Transcription API converts audio into text and can be called through OpenAI's SDK. This document describes the API endpoint, request parameters, and response structure.

Endpoint

POST https://platform.llmprovider.ai/v1/audio/transcriptions

Request Headers

| Header | Value |
| --- | --- |
| Authorization | Bearer YOUR_API_KEY |
| Content-Type | multipart/form-data |

Request Body

| Parameter | Type | Description |
| --- | --- | --- |
| file | file | The audio file object (not the file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. Maximum file size: 20 MB. |
| model | string | ID of the model to use (e.g., paraformer-v2). |
| prompt | string | (Optional) Text to guide the model's style or to continue a previous audio segment. |
| response_format | string | (Optional) The format of the transcript output: json, text, srt, verbose_json, or vtt. Default is json. |
| temperature | number | (Optional) The sampling temperature, between 0 and 1. Default is 0. |
| language | string | (Optional) The language of the input audio (e.g., en, es, fr). |
| timestamp_granularities[] | array | (Optional) The timestamp granularities to populate for this transcription. |
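
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_form_fields` is a hypothetical helper name, and it assumes that "20 MB" means 20 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Values taken from the parameter table above.
ALLOWED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
ALLOWED_RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
MAX_FILE_BYTES = 20 * 1024 * 1024  # assumption: the 20 MB limit is counted in MiB


def build_form_fields(file_name, file_size, model, *, prompt=None,
                      response_format="json", temperature=0.0, language=None):
    """Hypothetical helper: validate inputs and return the non-file form fields."""
    extension = Path(file_name).suffix.lower().lstrip(".")
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {extension!r}")
    if file_size > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 20 MB limit")
    if response_format not in ALLOWED_RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    # Optional parameters are only sent when the caller supplies them.
    if prompt is not None:
        fields["prompt"] = prompt
    if language is not None:
        fields["language"] = language
    return fields
```

Calling `build_form_fields("audio.mp3", 1024, "paraformer-v2", language="en")` returns the text fields to send alongside the file part; invalid input raises `ValueError` before any network call is made.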

Response Body

The response is a transcription object or, when response_format is verbose_json, a verbose transcription object.

The transcription object (JSON)

| Parameter | Type | Description |
| --- | --- | --- |
| text | string | The transcribed text. |

{
  "text": "Hello, this is the transcribed text from the audio file."
}
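
A client usually only needs the `text` field from this object. The sketch below shows one way to read it; `extract_text` is a hypothetical helper, and the branch for text, srt, and vtt assumes those formats come back as plain text rather than JSON, which this document does not state.

```python
import json


def extract_text(body, response_format="json"):
    """Hypothetical helper: return the transcript from a raw response body."""
    if response_format in ("json", "verbose_json"):
        # Both JSON variants documented here carry the transcript under "text".
        return json.loads(body)["text"]
    # Assumption: text, srt, and vtt responses are already plain text.
    return body


body = '{"text": "Hello, this is the transcribed text from the audio file."}'
print(extract_text(body))
```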

The transcription object (Verbose JSON)

| Parameter | Type | Description |
| --- | --- | --- |
| task | string | The task performed by the model. |
| language | string | The language of the input audio. |
| duration | number | The duration of the audio in seconds. |
| segments | array | Segments of the transcribed text and their corresponding details. |
| text | string | The transcribed text. |
| words | array | Extracted words and their corresponding timestamps. |

{
  "task": "transcribe",
  "language": "en",
  "duration": 2.95,
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.95,
      "text": "Hello, this is the transcribed text from the audio file.",
      "tokens": [50364, 2425, 11, 359, 307, 1161, 1123, 422, 264, 1467, 1780],
      "temperature": 0.0,
      "avg_logprob": -0.458,
      "compression_ratio": 0.688,
      "no_speech_prob": 0.0192
    }
  ],
  "text": "Hello, this is the transcribed text from the audio file."
}
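
The `segments` array makes it possible to pair each piece of text with its time range. As a rough illustration, the sketch below turns a verbose transcription object into timestamped lines; `format_segments` is a hypothetical helper, and the sample dictionary is a trimmed copy of the example above.

```python
def format_segments(transcription):
    """Hypothetical helper: return one "[start - end] text" line per segment."""
    lines = []
    for segment in transcription.get("segments", []):
        start, end = segment["start"], segment["end"]
        lines.append(f"[{start:.2f}s - {end:.2f}s] {segment['text'].strip()}")
    return lines


# Trimmed copy of the verbose transcription example.
sample = {
    "task": "transcribe",
    "language": "en",
    "duration": 2.95,
    "segments": [
        {
            "id": 0,
            "start": 0.0,
            "end": 2.95,
            "text": "Hello, this is the transcribed text from the audio file.",
        }
    ],
    "text": "Hello, this is the transcribed text from the audio file.",
}

for line in format_segments(sample):
    print(line)  # [0.00s - 2.95s] Hello, this is the transcribed text from the audio file.
```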

Example Request

curl -X POST https://platform.llmprovider.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@audio.mp3" \
  -F model="paraformer-v2"
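
For callers who prefer Python, the same request can be approximated with the standard library alone. This is a sketch under assumptions, not an official client: `encode_multipart`, `build_request`, and `transcribe` are hypothetical names, the file part is labeled `application/octet-stream` as a guess, and the API key is read from an environment variable named `YOUR_API_KEY` to mirror the curl command.

```python
import os
import urllib.request
import uuid

ENDPOINT = "https://platform.llmprovider.ai/v1/audio/transcriptions"


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one "file" part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumption: generic type
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    # The Content-Type header must carry the boundary used in the body.
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, file_name, file_bytes, model="paraformer-v2"):
    """Build (but do not send) the POST request."""
    body, content_type = encode_multipart({"model": model}, file_name, file_bytes)
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )


def transcribe(path):
    """Send the file at `path` and return the raw response body as text."""
    with open(path, "rb") as audio:
        request = build_request(os.environ["YOUR_API_KEY"], os.path.basename(path), audio.read())
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `transcribe("audio.mp3")` performs the network request. Keeping `build_request` separate makes it possible to inspect the headers and body without contacting the server.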

For any questions or further assistance, please contact us at [email protected].